Sensitive sequence comparison as protein function predictor.
Abstract
Protein function assignments based on postulated homology, as recognized by high sequence similarity, are used routinely in genome analysis. Improvements in the sensitivity of sequence comparison algorithms have reached the point that proteins with previously undetectable sequence similarity, for instance 10-15% identical residues, can sometimes be classified as similar. What is the relation between such proteins? Is it possible that they are homologous? What is the practical significance of detecting such similarities? A simplified analysis of the relation between sequence similarity and function similarity is presented here for the well-characterized proteins from the E. coli genome. Using a simple measure of functional similarity based on the E.C. classification of enzymes, it is shown that functional similarity correlates well with sequence similarity as measured by the statistical significance of the alignment score. Proteins that are similar by this standard, even in cases of low sequence identity, have a much larger chance of having similar function than randomly chosen protein pairs. Interesting exceptions to these rules are discussed.
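As an illustration of what a simple E.C.-based functional similarity measure might look like, the sketch below counts how many leading levels two E.C. numbers share (0 to 4). This is an assumption made for illustration only: the abstract does not specify the exact measure used, and the function name ec_shared_levels is hypothetical.

```python
def ec_shared_levels(ec_a: str, ec_b: str) -> int:
    """Count the leading E.C. levels (0 to 4) on which two E.C. numbers agree.

    Hypothetical helper for illustration; not the measure defined in the paper.
    """
    shared = 0
    for a, b in zip(ec_a.strip().split("."), ec_b.strip().split(".")):
        # A dash marks an unassigned level, so it cannot count as agreement.
        if a != b or a == "-":
            break
        shared += 1
    return shared


if __name__ == "__main__":
    # Alcohol dehydrogenase (1.1.1.1) vs. L-lactate dehydrogenase (1.1.1.27):
    # same class, subclass and sub-subclass, different serial number.
    print(ec_shared_levels("1.1.1.1", "1.1.1.27"))
    # Alcohol dehydrogenase (1.1.1.1) vs. hexokinase (2.7.1.1):
    # different top-level class, so no levels are shared.
    print(ec_shared_levels("1.1.1.1", "2.7.1.1"))
```

Under this sketch, a pair sharing three or four levels would be treated as functionally similar and a pair sharing none as functionally different; where to draw that threshold is itself a choice that the abstract leaves open.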
Similar resources
The Comparison of the Effectiveness of a Modified Conformation Sensitive Gel Electrophoresis with Denaturing High Performance Liquid Chromatography
Background: Several methods have been developed for the detection of sequence variation in genes, and each has its advantages and disadvantages. One drawback is that the simpler, cost-effective methods are commonly perceived as being less sensitive in their detection of sequence variation, whereas those with proven sensitivity have a requirement for complex or expensive laboratory equipmen...
Identifying sequence regions undergoing conformational change via predicted continuum secondary structure
MOTIVATION Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accura...
Protein profiling and analysis of drug sensitive and multidrug resistant isolates of Mycobacterium tuberculosis by native polyacrylamide gel electrophoresis and mass spectrometry
Introduction: Tuberculosis (TB) remains a deadly infectious disease despite all the efforts to reduce its incidence. The spread of multidrug resistant TB has seriously undermined the efforts to control the disease globally. In this study, the protein expression profiles of MDR and sensitive isolates of MTB were analyzed and compared in order to identify proteins which could be used in prevention, diagno...
Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task
In this paper, we have tried to predict earthquake events in a cluster of seismic data on the Pacific Ring of Fire, using multivariate adaptive regression splines (MARS). The model is employed as either a predictor for a sequence prediction task, or a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...
Sequential Prediction of Individual Sequences Under General Loss Functions
We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on univer...
Journal title:
- Pacific Symposium on Biocomputing
Publication date: 2000